PROBLEM STATEMENT

In scheduling a set of tasks, it is often not 
known with certainty how long a given event 
will take. We call this duration uncertainty. 
For example, as part of the task of making a 
telescope observation, the telescope must be 
accurately centered on a star. The time 
required to perform this subtask cannot be 
accurately predicted, since it depends on 
factors which vary from execution to 
execution (e.g., the position of the telescope 
at the start of the execution of this task). 

Duration uncertainty is a primary obstacle
to the successful completion of a schedule. If
one task takes longer than expected, the
remaining tasks are delayed. The delay may
result in the abandonment of the schedule
itself, a phenomenon known as schedule
breakage. One response to schedule breakage
is on-line, dynamic rescheduling. A more
recent alternative is called proactive
rescheduling [2]. This method uses
statistical data about the durations of events
to anticipate, before the schedule is
executed, the locations in the schedule where
breakage is likely. It generates alternative
schedules at such sensitive points, which the
scheduler can then apply at execution time,
without the delay incurred by dynamic
rescheduling.

This paper proposes a technique for 
making proactive error management more 
effective. The technique is based on applying 
a similarity-based method of clustering to the 
problem of identifying similar events in a set 
of events. The remainder of this paper
discusses the following:

1. The intuitions underlying the technique; 

2. The way in which clustering techniques 
from the AI literature can be applied to 
the problem of managing duration 
uncertainty in scheduling; 

3. The requisite assumptions about the 
domain for applying the technique; and 

4. An implementation strategy. 

INTUITIONS 

The events under consideration have
occurrences that need to be scheduled.

The goal is to find an ordering of these
occurrences that minimizes the expected
duration uncertainty associated with each.
The knowledge used to find the ordering
comes from observations of repeated past
occurrences of the same events. Figure 1
represents repeated occurrences of an event
E: E recurs 4 times over a stretch of time.
Duration uncertainty is depicted visually as
the difference in the lengths of the lines,
each of which represents a single occurrence.
We assume that the events under
consideration all tend to exhibit duration
uncertainty.

Figure 1: A repeating event

The heuristic formalized here is that
duration uncertainty can often be reduced
by scheduling an event in temporal
proximity to a similar event.

A mundane example will illustrate. Suppose
I am scheduling my daily household chores.
I find that I must complete three tasks:
clean the kitchen (K), clean the bathroom
(B) and work in the garden (G). I can do
these in any order; my main constraint is to
finish all three within a certain time frame.
One is naturally led to a plan that performs
K and B together, either before or after G.
Why? The tasks are similar, either because
they are both cleaning tasks or because they
are both indoor tasks.

How does scheduling similar events in close
temporal proximity reduce duration
uncertainty? Intuitively, actions are
sometimes similar because they share a
number of stages. For example, any
room-cleaning action includes a preparation
stage: getting the mop or broom, floor
cleaner, water, a bucket, and so on. If I
perform the room-cleaning actions together,
say $K \rightarrow B$ (clean the kitchen
followed by clean the bathroom), the
preparation stage of B will not be required
(or will be simplified). Since the duration of
any action is the sum of the durations of its
stages, the duration uncertainty of the whole
is likewise a function of the duration
uncertainty of its stages. It follows that I
should be able to predict more accurately
how long the bathroom cleaning will take
when it is preceded by the kitchen cleaning
than I could predict its duration in isolation,
or when it is preceded by a dissimilar event.
This conclusion is justified by noting that the
preparation stage, in such a situation, does
not exist; hence, trivially, there is no
uncertainty associated with it, which
reduces the uncertainty of the whole event.
Graphically, this can be represented as in
Figure 2, which shows the expected
durations of kitchen events when paired
with the similar bathroom-cleaning event.
On the other hand, if paired with a
dissimilar event (e.g., gardening), one would
expect K to behave as in Figure 1.

Figure 2: Pairing an event with a similar event
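The stage argument can be made concrete under one added assumption that the text does not state: that stage durations are statistically independent, so their variances add. The minimal Python sketch below uses hypothetical stage standard deviations for B (the names `B_STAGES`, `duration_std` and `paired_std` are illustrative, not from the paper) and shows the standard deviation of the whole falling when the preparation stage is eliminated.

```python
import math

# Hypothetical per-stage standard deviations (minutes) for cleaning the
# bathroom (B). Illustrative values only; they are not data from the paper.
B_STAGES = {"prepare": 4.0, "scrub": 6.0, "mop": 3.0}


def duration_std(stage_stds):
    """Standard deviation of a duration that is the sum of its stages.

    ASSUMPTION: stage durations are independent, so their variances add.
    """
    return math.sqrt(sum(s ** 2 for s in stage_stds.values()))


def paired_std(stage_stds, shared=("prepare",)):
    """Standard deviation when a similar predecessor eliminates the shared stages."""
    remaining = {name: s for name, s in stage_stds.items() if name not in shared}
    return duration_std(remaining)


if __name__ == "__main__":
    print(f"B in isolation: {duration_std(B_STAGES):.2f}")  # 7.81, i.e. sqrt(61)
    print(f"B after K:      {paired_std(B_STAGES):.2f}")    # 6.71, i.e. sqrt(45)
```

Dropping the preparation stage removes its variance term from the sum, which is the sense in which pairing K with B "trivially" removes that stage's uncertainty.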

In ordering mundane events, we implicitly 
bring to bear the ability to apply concepts 
which cluster events into similarity classes. 
This paper addresses the same problem 
when such a priori conceptual knowledge 
about a domain is lacking. For example, in 
the telescope scheduling domain, it may be 
difficult or impossible to classify a priori 
whether two tasks to be scheduled are 
similar or not. The main contribution of this 
paper is to suggest that there is a posteriori 
knowledge (knowledge gained from 
experience) that can be used to infer the 
similarity of events. 

COMPUTATIONAL MODEL 

The computational problem to be solved
can be stated as follows: given a set E of k
events, find an ordering
$E_1 \rightarrow E_2 \rightarrow \cdots \rightarrow E_k$
of all the elements in E that minimizes the
expected duration uncertainty over all
members of E. The previous section justified
the intuition that some orderings of events
will exhibit less duration uncertainty than
others. In this section, a technique for
finding these preferred orderings will be
presented.

Similarity Based On Relative Durations of Events

Based on observations in the previous 
section, the notion of similarity between two 
events e and e' can be induced from 
observations of the durations of each event 
when they are placed in close temporal 
proximity. 

Definition 1 The relative duration of e
with respect to e', written $\mathrm{rd}(e, e')$, is the
duration of e when e immediately follows e'.
The relative average duration of an event e
with respect to an event e' is the average
duration of e when it immediately follows e',
over a set of occurrences of e and e'.

$\mathrm{rd}(e, e')$ can be viewed as a discrete random
variable, associating a duration with the
outcome of pairing the two events. Let
$\sigma_{rd}(e, e')$ denote the standard deviation of
$\mathrm{rd}(e, e')$. It is then possible to define the
notion of relative similarity between triples
of events $e_1, e_2, e_3$.

Definition 2 $e_1$ is at least as similar to $e_2$
as to $e_3$ if $\sigma_{rd}(e_1, e_2) \le \sigma_{rd}(e_1, e_3)$.

An absolute concept of similarity can be
defined when a similarity threshold is
postulated. Let $\theta$ be such a threshold. Then:

Definition 3 Let e and e' be events. Then e
is similar to e' if $\sigma_{rd}(e, e') < \theta$.
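Definitions 1 to 3 translate directly into code once observed durations are available. The sketch below is a minimal Python illustration under stated assumptions: the observation table `RD` and its numbers are hypothetical, `RD[(e, e_prev)]` holds durations of e measured when it immediately followed e_prev, and the sample standard deviation is used as the estimator of $\sigma_{rd}$.

```python
import statistics

# Hypothetical observations (minutes): RD[(e, e_prev)] lists durations of e
# measured on occasions when e immediately followed e_prev (Definition 1).
RD = {
    ("B", "K"): [20, 21, 19, 20, 20],  # bathroom after kitchen
    ("B", "G"): [18, 27, 22, 30, 15],  # bathroom after gardening
}


def relative_average_duration(e, e_prev):
    """Average of rd(e, e_prev) over the observed occurrences."""
    return statistics.mean(RD[(e, e_prev)])


def sigma_rd(e, e_prev):
    """Sample standard deviation of rd(e, e_prev)."""
    return statistics.stdev(RD[(e, e_prev)])


def at_least_as_similar(e1, e2, e3):
    """Definition 2: e1 is at least as similar to e2 as to e3."""
    return sigma_rd(e1, e2) <= sigma_rd(e1, e3)


def similar(e, e_prev, theta):
    """Definition 3: e is similar to e_prev under threshold theta."""
    return sigma_rd(e, e_prev) < theta


if __name__ == "__main__":
    print(similar("B", "K", theta=2), similar("B", "G", theta=2))  # True False
```

With $\theta = 2$, B counts as similar to K but not to G in this made-up data, in line with the household example.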

Any similarity relation is reflexive and
symmetric, but not necessarily transitive. The
claim here is that comparing the value of
$\sigma_{rd}(e_1, e_2)$ to a threshold can be viewed as
applying a similarity relation. Clearly,
reflexivity holds and transitivity is not
required. By definition, symmetry requires
that if $\sigma_{rd}(e, e') < \theta$, then $\sigma_{rd}(e', e) < \theta$.
Reflection on intuition should make this
assumption plausible. Recall that the
postulated reason for the reduction of
duration uncertainty when events are paired
with similar events is that they share a
stage, which is eliminated or simplified
when the events are paired together. Clearly,
the order of the pairing is irrelevant. For
example, whether $K \rightarrow B$ or $B \rightarrow K$, the
duration uncertainty of the later event will
be reduced. Hence, it is reasonable to
assume that similarity, as defined in the
previous definition, is symmetric.
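Because symmetry is assumed rather than derived from Definition 3, it can be checked against data. The hypothetical helper below (its name and the example values are illustrative) lists the unordered pairs for which the threshold test gives different answers depending on which event came first; pairs observed in only one order are skipped.

```python
# Hypothetical sigma_rd values keyed by (e, e_prev); illustrative only.
EXAMPLE_SIGMA = {
    ("B", "K"): 0.7, ("K", "B"): 1.1,  # consistent: both below theta = 2
    ("C", "S"): 1.5, ("S", "C"): 3.0,  # inconsistent across the two orders
    ("G", "B"): 6.0,                   # reverse order never observed
}


def symmetry_violations(sigma, theta):
    """Return unordered pairs for which Definition 3 is not symmetric."""
    violations = set()
    for (e, e_prev), s in sigma.items():
        reverse = sigma.get((e_prev, e))
        if reverse is None:
            continue  # no data for the opposite order; nothing to compare
        if (s < theta) != (reverse < theta):
            violations.add(frozenset((e, e_prev)))
    return violations


if __name__ == "__main__":
    print(symmetry_violations(EXAMPLE_SIGMA, theta=2))
```

On this data only the C/S pair is reported; a large number of such pairs in real observations would cast doubt on the symmetry assumption for that domain.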

Relation to Clustering Methods 

In order to reduce duration uncertainty in
an error management system for scheduling,
events should be ordered so that similar
events are clustered. The similarity-based
clustering method [3] is a weak AI method
that can be employed to generate efficient
orderings. The computational problem of
interest here can be viewed as an instance
of one-dimensional clustering. For such a
problem, the goal is to reduce the number of
distinct values of a set of variables by
identifying near-equivalence classes of
values based on similarity. To briefly
illustrate the technique of clustering, we
introduce a data structure called a
σ-graph:

Definition 4 A σ-graph is a weighted
directed graph with the following
characteristics. Each vertex is labeled by one
of the elements in a set E. Each directed
edge $(e_i, e_j)$ between source node $e_i$ and target
node $e_j$ is labeled with a value representing
the degree of similarity between $e_i$ and $e_j$.
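One minimal way to hold a σ-graph in Python is a dictionary of dictionaries mapping each source vertex to its targets and arc labels. In the sketch below, `build_sigma_graph`, `most_similar_targets` and the example labels are hypothetical; arcs without observations are simply absent, which also accommodates an incomplete graph, and lower labels mean greater similarity. Self-loops are ignored by assumption.

```python
def build_sigma_graph(events, labels):
    """Build a sigma-graph (Definition 4) as {source: {target: label}}.

    `labels` maps a directed pair (source, target) to the arc label.
    Pairs involving unknown events, and self-loops, are ignored.
    """
    graph = {e: {} for e in events}
    for (src, dst), value in labels.items():
        if src in graph and dst in graph and src != dst:
            graph[src][dst] = value
    return graph


def most_similar_targets(graph, source):
    """Targets of `source`, most similar (lowest label) first."""
    return sorted(graph[source], key=graph[source].get)


if __name__ == "__main__":
    # Hypothetical labels; these are not the values of Figure 3.
    g = build_sigma_graph(
        ["K", "B", "G"],
        {("K", "B"): 1.0, ("K", "G"): 6.0, ("B", "G"): 5.0},
    )
    print(most_similar_targets(g, "K"))  # ['B', 'G']
```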

To illustrate, consider a slightly more
complex mundane example. Now there are
five events: K, B and G, as before, but also
the tasks wash car (C) and go to store (S).
An incomplete σ-graph for this set of events
is found in Figure 3. Here, the lower the
value on an arc, the greater the degree of
similarity between the two events.

Clustering techniques are traditionally
used for automating concept formation. One
clustering method, called agglomeration,
fuses entities to form groupings based on a
threshold of minimum similarity. The fusion
process stops when all values exceed the
threshold. For example, if the threshold is
assumed to be 2, the agglomerative process
applied to the example would fuse B and K
into a cluster.
For our purposes, however, clustering is a 
means to an end, viz., to generate an 
ordering of events which reduces the amount 
of duration uncertainty with which a 
proactive scheduling error manager needs to 
contend. The following section describes 
how similarity-based clustering can be 
implemented for this purpose. 
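The threshold-based agglomeration described above can be sketched as follows. This is an illustration under assumptions not fixed by the text: it uses single linkage (two clusters are as close as their closest pair of members, reading arcs in either direction), treats missing arcs as infinitely dissimilar, and uses hypothetical labels that are not the values of Figure 3, chosen only so that a threshold of 2 fuses B and K as in the example.

```python
# Hypothetical arc labels (lower = more similar); not the values of Figure 3.
EXAMPLE_LABELS = {
    ("B", "K"): 1.0, ("K", "C"): 3.0, ("C", "S"): 4.0,
    ("S", "G"): 5.0, ("B", "G"): 8.0,
}


def agglomerate(events, labels, theta):
    """Fuse clusters while some pair of clusters is linked below `theta`.

    Single linkage: the distance between two clusters is the smallest label
    on any arc (in either direction) joining their members.
    """
    clusters = [frozenset([e]) for e in events]

    def link(c1, c2):
        values = [labels[(a, b)] for a in c1 for b in c2 if (a, b) in labels]
        values += [labels[(b, a)] for a in c1 for b in c2 if (b, a) in labels]
        return min(values, default=float("inf"))  # no arc: never fused

    while len(clusters) > 1:
        value, i, j = min(
            (link(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        if value >= theta:
            break  # every remaining inter-cluster value reaches the threshold
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    for cluster in agglomerate(["K", "B", "G", "C", "S"], EXAMPLE_LABELS, theta=2):
        print(sorted(cluster))
```

With these labels, a threshold of 2 yields the cluster {B, K} and leaves G, C and S as singletons; raising the threshold to 3.5 would also pull C into that cluster.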

Implementation and Intended Use 

The procedure for generating efficient 
orderings of events based on relative 
durations is intended to be used as a 
preprocessing stage in a proactive error 
management system for scheduling. The 
stage can be viewed as one that deletes from 
the set of possible orderings those which 
exhibit the most duration uncertainty. 

Assume as input a set E of k events. The
set E has been executed up to m times in
some or all of the k! permutations of the
orderings of the events in E. Assume an
ordering of these permutations and
executions. Let $\mathrm{rd}(E_i, E_j)[p,q]$ represent the
duration of $E_i$ when it immediately follows
$E_j$ on the p-th occurrence of the q-th
permutation of E; thus $1 \le p \le m$ and
$1 \le q \le k!$. Since each execution of a
permutation contains k − 1 adjacent pairs,
this yields $O(k!\,m(k-1))$ values of
$\mathrm{rd}(E_i, E_j)[p,q]$ in total over all pairs
$E_i, E_j \in E$. From this data, an ordering of a
set E of events that minimizes duration
uncertainty is obtained by the following
steps:

1. For each $E_i$ in E, compute the mean of
the set $\{\mathrm{rd}(E_i, E_j)[p,q] : 1 \le p \le m,\ 1 \le q \le k!\}$
and $\sigma_{rd}(E_i, E_j)$, for each pairing
of $E_i$ with every other $E_j \in E$;

2. Form a σ-graph with E as the set of
vertices such that, for each pair $E_i, E_j$ in E,
there is an arc labeled with the value of
$\sigma_{rd}(E_i, E_j)$; and

3. Apply an all-pairs shortest-path
algorithm [1], such as Floyd-Warshall, to
generate an ordering of the events.
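The size of the data set available to step 1 can be checked by brute force. The hypothetical helper below assumes every one of the k! permutations is executed exactly m times (the upper bound in the text) and tallies the resulting rd values per ordered pair; each ordered pair receives m(k − 1)! values, for a total of k!·m(k − 1).

```python
from itertools import permutations
from math import factorial


def count_rd_values(events, m):
    """Tally rd values per ordered pair (later event, earlier event).

    ASSUMPTION: every permutation of `events` is executed exactly m times.
    """
    counts = {}
    for perm in permutations(events):
        for prev, cur in zip(perm, perm[1:]):  # adjacent pairs in this ordering
            counts[(cur, prev)] = counts.get((cur, prev), 0) + m
    return counts


if __name__ == "__main__":
    k, m = 4, 3
    counts = count_rd_values("ABCD", m)
    print(sum(counts.values()), factorial(k) * m * (k - 1))  # 216 216
    print(set(counts.values()), m * factorial(k - 1))        # {18} 18
```

The factorial growth is worth keeping in mind: exhaustively executing every permutation is only realistic for small k, so in practice the "some or all" clause in the text matters.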

For example, assume that Figure 3 
represents the result of completing step 2 in 
the procedure. Thus, the labels on the arcs 
represent the standard deviations of the 
relative durations of the event occurrences 
connected by the arc. If the claims made in 
this paper are plausible, then such values 
would be the kind expected, since they 
reflect the intuitive degree of similarity 
among the events. Then, the result of
applying step three would yield

$B \rightarrow K \rightarrow C \rightarrow S \rightarrow G$

as well as other orderings which are minimal 
with respect to duration uncertainty. 
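Step 3 names an all-pairs shortest-path algorithm such as Floyd-Warshall. Because a shortest path between two events need not pass through every event, the sketch below swaps in a different, plainly named technique: an exhaustive search for the orderings of all k events whose consecutive arc labels have the smallest sum (a minimum-cost Hamiltonian path). It assumes symmetric labels, as argued earlier, and uses hypothetical labels chosen only so that the result mirrors the ordering above; they are not the values of Figure 3.

```python
from itertools import permutations

# Hypothetical symmetric labels for the five chores; not the values of Figure 3.
EXAMPLE_LABELS = {
    ("B", "K"): 1.0, ("K", "C"): 3.0, ("C", "S"): 4.0, ("S", "G"): 5.0,
    ("B", "C"): 7.0, ("B", "G"): 8.0, ("B", "S"): 9.0,
    ("K", "G"): 9.0, ("K", "S"): 9.0, ("C", "G"): 9.0,
}


def arc(labels, a, b):
    """Label between a and b, read in either direction (symmetry assumed)."""
    return labels.get((a, b), labels.get((b, a), float("inf")))


def ordering_cost(order, labels):
    """Total label along consecutive events of an ordering."""
    return sum(arc(labels, a, b) for a, b in zip(order, order[1:]))


def best_orderings(events, labels):
    """All orderings with minimal total label, found by exhaustive search.

    Examines all k! orderings, so this is practical only for small k.
    """
    best, best_cost = [], float("inf")
    for order in permutations(events):
        cost = ordering_cost(order, labels)
        if cost < best_cost:
            best, best_cost = [order], cost
        elif cost == best_cost:
            best.append(order)
    return best, best_cost


if __name__ == "__main__":
    orders, cost = best_orderings(["K", "B", "G", "C", "S"], EXAMPLE_LABELS)
    for order in orders:
        print(" -> ".join(order), cost)
```

With these labels the search returns B → K → C → S → G and its reverse, each with total 13, which illustrates the "other orderings which are minimal" that a symmetric measure produces.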

An example of a proactive scheduling
system that might benefit from the
account presented here is the Just-In-Case
(JIC) error management technique
described in [2]. This technique analyzes a
schedule of telescope observations for
possible execution breaks. For the break
point with the highest probability of
occurrence, the system forms a contingent
alternative schedule. JIC uses duration
uncertainty measures to calculate the
possible schedule break points. As a
preprocessing stage to the error management
procedure, the three-step method presented
in this section could be applied to
discriminate among different orderings of the
events, selecting the ones that minimize
duration uncertainty. This would reduce the
number of anticipated break points with
which the error manager has to contend.

ASSUMPTIONS AND LIMITATIONS 

To be of optimal benefit for its intended 
use, the events to be analyzed by the method 
should possess the following properties: 

1. The events in E should be causally
independent; this means at least that:

• No occurrence $E_i$ in E prohibits the
execution of any other $E_j$, and

• No occurrence $E_i$ presupposes the
execution of some other $E_j$;

and

2. Each of the events in E has the
tendency to exhibit duration
uncertainty; this means that, considered
in isolation, the standard deviation of
the duration of each event is high.

Even with these minimum assumptions,
$\sigma_{rd}(E_i, E_j)$ is a coarse measure of event
similarity. For example, assume $E_i$ consists
of the stages A, B and C, and $E_j$ consists of
A, E and F. Assume that the duration
uncertainty of $E_j$ is caused completely by
stage F. Then the approach proposed here
would fail to recognize that the two events
are similar (in the sense of sharing the
common stage A), since $E_j$ would not
demonstrate a reduction of duration
uncertainty when paired with $E_i$. In such a
case, it would be useful to view the absolute
reduction in mean duration as evidence of
its similarity to $E_i$. That is, since $E_j$ shares
a stage with $E_i$, its pairing with $E_i$ should
reduce the time it takes to execute. Hence,
it may be the case that both mean duration
and standard deviation should be viewed as
the measure of similarity. This could easily
be added to the implementation by including
mean duration as part of the labels on the
arcs of the σ-graph. The addition would
imply a two-dimensional description space
for the events, and a similarity concept
based on a vector of attributes.
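The suggested extension can be sketched with two-attribute arc labels. Two details are assumptions, since the text leaves them open: the mean reduction is measured against durations of $E_j$ observed without a similar predecessor (`baseline` below), and the hypothetical rule `similar_2d` treats a pair as similar when either attribute passes its own threshold. The made-up numbers mimic the stage-F scenario: the spread does not shrink, but the mean drops.

```python
import statistics


def arc_label(after, baseline):
    """Two-attribute label (mean reduction, sigma_rd) for one arc.

    after:    durations of E_j observed when it immediately followed E_i.
    baseline: ASSUMED reference durations of E_j without a similar predecessor.
    """
    mean_reduction = statistics.mean(baseline) - statistics.mean(after)
    return mean_reduction, statistics.stdev(after)


def similar_2d(label, theta_sigma, delta):
    """Hypothetical rule: similar if sigma_rd < theta_sigma or the mean drops by >= delta."""
    mean_reduction, sigma = label
    return sigma < theta_sigma or mean_reduction >= delta


if __name__ == "__main__":
    # Hypothetical durations (minutes): same spread, mean lower by 8 after E_i.
    baseline = [30, 40, 35, 45, 25]
    after = [22, 32, 27, 37, 17]
    label = arc_label(after, baseline)
    print(similar_2d(label, theta_sigma=2, delta=5))  # True
```

Here $\sigma_{rd}$ alone (about 7.9) would fail a threshold of 2, but the mean reduction of 8 still marks the pair as similar, which is the case the paragraph above describes.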

There may be other forms of causal
interaction that would make the ordering
produced by this procedure less preferred
than others. Consider, for example, events
$E_i$ and $E_j$ again. Perhaps the pairing
$E_i \rightarrow E_j$ would reduce the standard
deviation of the duration of $E_j$, and hence
be preferred by the proposed model.
However, it is possible that this pairing
would increase the absolute duration of $E_j$.

CONCLUSION 

This paper has offered an approach for 
aiding proactive error management 
techniques for scheduling. The idea is to use 
statistical temporal information about event 
occurrences to induce similarities among 
these occurrences, when conceptual 
information about the same events is 
unavailable. Pairing similar events in close 
temporal proximity can often reduce the 
uncertainty in the expected duration of the 
events. This leads to the potential for a 
reduction in the amount of rescheduling 
required by the proactive error manager. 